A Class of DCT Approximations Based on the Feig-Winograd Algorithm


C. J. Tablada* F. M. Bayer† R. J. Cintra*


Abstract 

A new class of matrices based on a parametrization of the Feig-Winograd factorization of the 8-point DCT is
proposed. Such parametrization induces a matrix subspace, which unifies a number of existing methods for DCT
approximation. By solving a comprehensive multicriteria optimization problem, we identified several new DCT
approximations. The solutions were sought to possess the following properties: (i) low multiplierless computational
complexity, (ii) orthogonality or near orthogonality, (iii) low-complexity invertibility, and (iv) close proximity and
performance to the exact DCT. The proposed approximations were assessed in terms of proximity to the
DCT, coding performance, and suitability for image compression. Considering Pareto efficiency, particular new
proposed approximations could outperform various existing methods archived in literature.
1 Introduction


The discrete cosine transform (DCT) is an essential tool in digital signal processing. This is mainly because the
DCT is asymptotically equivalent to the Karhunen-Loève transform (KLT), which possesses optimal decorrelation and
energy compaction properties. Such good characteristics are attained when highly correlated first-order Markov
signals are considered. Importantly, natural images belong to this particular class of signals.

In particular, the DCT has been adopted in several image and video coding schemes [7], such as JPEG, MPEG-1,
MPEG-2, H.261, H.263, AVC/H.264, and the recent HEVC/H.265. The H.264 and HEVC standards employ 8-point DCT
algorithms, among other blocklengths, such as 4, 16, and 32 points. In [24], the 8-point DCT stage of the HEVC was
optimized. Because of this increasing demand, several algorithms for the efficient computation of the 8-point DCT
have been proposed [25, 26]. In [1, 2], comprehensive surveys on DCT algorithms are detailed.


Noteworthy DCT methods include the following procedures: Wang factorization [27], Lee DCT for power-of-two
block lengths [28], Arai DCT scheme, Loeffler algorithm [30], and Feig-Winograd factorization [31]. All these
algorithms are classical results in the field and have been considered for practical applications. For instance,
the Arai DCT scheme was employed in various recent hardware implementations of the DCT.

Indeed, after the introduction of the DCT by Ahmed et al. in 1974, designing efficient DCT algorithms has been
a major scientific effort in the circuits, systems, and signal processing community. Because of such intense research in
the field, the current exact methods are very close to the theoretical DCT complexity [30, 37]. Therefore, it may
be unrealistic to expect that new exact algorithms could offer dramatic computational gains for such a fundamental and
deeply investigated mathematical operation.

*C. J. Tablada and R. J. Cintra are with the Signal Processing Group, Departamento de Estatística, Universidade Federal de Pernambuco.
E-mail: rjdsc@dsp.ufpe.org

†F. M. Bayer is with the Departamento de Estatística and LACESM, Universidade Federal de Santa Maria, Brazil.

In this scenario, approximate methods offer an alternative way to further reduce the computational complexity of
the DCT [39-42]. While not computing the DCT exactly, such approximations can provide meaningful estimations
at very low computational cost. In this sense, literature has been populated with approximate
methods for the efficient computation of the DCT. For example, the AVC/H.264 and HEVC/H.265 standards employ
integer approximate DCTs in order to reduce the computational cost of the transform stage. A comprehensive list of
approximate methods for the DCT is available in the literature. Prominent methods include the signed DCT (SDCT), the
level 1 approximation by Lengwehasatit-Ortega [43], the Bouguezel-Ahmad-Swamy (BAS) series of algorithms [41, 42, 44-47],
the DCT round-off approximation [48], the modified DCT round-off approximation [40], and the multiplier-free DCT
approximation for radio-frequency (RF) multi-beam digital aperture-array space imaging [49].

Although several other types of approximations are available, in general, very low-complexity approximation
matrices have their elements defined on the set {0, ±1/2, ±1, ±2} [40, 42, 47, 48]. Thus, such transformations possess
null multiplicative complexity, because the required arithmetic operations can be implemented exclusively by means of
binary additions and bit-shifting operations. Indeed, DCT approximations can replace the exact DCT in hardware
implementation and high-speed computation/processing, while having low hardware costs and low power demands [36].
Effectively, DCT approximations have been considered for applications in real-time video transmission and
processing, satellite communication systems, portable computing applications, radio-frequency smart antenna
arrays [49], and wireless image sensor networks [52].

The methods archived in literature for generating very low-complexity approximations include: (i) crude
approximations [48]; (ii) inspection [41, 45, 46]; (iii) variations of previous approximations via a single-parameter
matrix [42]; and (iv) optimization procedures based on the DCT structure. Thus, the existing approximations appear
as isolated cases without a unifying mathematical formalism.

The aim of this paper is two-fold. Our first goal is to introduce a new class of DCT approximations. To this end,
we consider a parametrization of the Feig-Winograd factorization of the 8-point DCT matrix [31]. Such parametrization
allows the construction of a specific matrix subspace. Second, over the introduced subspace, we solve a constrained
multicriteria optimization problem to identify optimal DCT approximations according to several figures of merit for
image compression. Both orthogonal and non-orthogonal approximations are sought.

The paper unfolds as follows. In Section 2, we describe the mathematical structure of the proposed class of
transforms, including fast algorithms for the direct and inverse transformations. Section 3 discusses the desirable properties
that approximate 8-point DCTs are expected to satisfy, such as low computational complexity, orthogonality, invertibility,
and proximity to the exact DCT. In Section 4, we formalize a multicriteria optimization problem based on a comprehensive
set of performance measures in order to identify efficient solutions and new transformations over the proposed class
of matrices. The resulting approximations are subject to assessment and extensive comparison with competing methods.
In Section 5, we perform a comprehensive image compression analysis considering the obtained optimal transforms,
using image quality measures as figures of merit. Section 6 concludes the paper.
2 Feig-Winograd Approximate DCT


2.1 Preliminaries 

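The DCT is algebraically represented by the $N \times N$ transformation matrix $\mathbf{C}_N$ whose elements are given by:

$$
c_{m,n} = \sqrt{\frac{2}{N}} \, \beta_{m-1} \cos\left( \frac{\pi (m-1)(2n-1)}{2N} \right),
$$

where $m,n = 1,2,\ldots,N$, $\beta_0 = 1/\sqrt{2}$, and $\beta_k = 1$, for $k \neq 0$. Let
$\mathbf{x} = [x_0\ x_1\ \cdots\ x_{N-1}]^\top$ be an input vector, where the superscript $\top$ denotes the
transposition operation. The one-dimensional (1-D) DCT transform of $\mathbf{x}$ is the $N$-point vector
$\mathbf{X} = [X_0\ X_1\ \cdots\ X_{N-1}]^\top$ given by $\mathbf{X} = \mathbf{C}_N \cdot \mathbf{x}$. Because
$\mathbf{C}_N$ is an orthogonal matrix, the inverse transformation can be written according to
$\mathbf{x} = \mathbf{C}_N^\top \cdot \mathbf{X}$.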

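Let $\mathbf{A}_N$ and $\mathbf{B}_N$ be square matrices of order $N$. For two-dimensional (2-D) signals, the forward
and inverse 2-D DCT operations are expressed by

$$
\mathbf{B}_N = \mathbf{C}_N \cdot \mathbf{A}_N \cdot \mathbf{C}_N^\top
\quad \text{and} \quad
\mathbf{A}_N = \mathbf{C}_N^\top \cdot \mathbf{B}_N \cdot \mathbf{C}_N,
$$

respectively.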

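Although the procedures described in this work can be applied to any blocklength, we focus exclusively on the
8-point DCT, which is denoted as $\mathbf{C}_8$ and is given by:

$$
\mathbf{C}_8 = \frac{1}{2} \cdot
\begin{bmatrix}
\gamma_3 & \gamma_3 & \gamma_3 & \gamma_3 & \gamma_3 & \gamma_3 & \gamma_3 & \gamma_3 \\
\gamma_0 & \gamma_2 & \gamma_4 & \gamma_6 & -\gamma_6 & -\gamma_4 & -\gamma_2 & -\gamma_0 \\
\gamma_1 & \gamma_5 & -\gamma_5 & -\gamma_1 & -\gamma_1 & -\gamma_5 & \gamma_5 & \gamma_1 \\
\gamma_2 & -\gamma_6 & -\gamma_0 & -\gamma_4 & \gamma_4 & \gamma_0 & \gamma_6 & -\gamma_2 \\
\gamma_3 & -\gamma_3 & -\gamma_3 & \gamma_3 & \gamma_3 & -\gamma_3 & -\gamma_3 & \gamma_3 \\
\gamma_4 & -\gamma_0 & \gamma_6 & \gamma_2 & -\gamma_2 & -\gamma_6 & \gamma_0 & -\gamma_4 \\
\gamma_5 & -\gamma_1 & \gamma_1 & -\gamma_5 & -\gamma_5 & \gamma_1 & -\gamma_1 & \gamma_5 \\
\gamma_6 & -\gamma_4 & \gamma_2 & -\gamma_0 & \gamma_0 & -\gamma_2 & \gamma_4 & -\gamma_6
\end{bmatrix},
$$

where $\gamma_k = \cos(2\pi(k+1)/32)$.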
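The element-wise definition above can be checked numerically. The following is a minimal sketch in plain Python, assuming 0-based indices (so the loop variable `m` stands for $m-1$ in the text); the helper names `dct_matrix` and `gram` are illustrative and not part of the original work.

```python
import math

def dct_matrix(n):
    """Return the n x n DCT matrix as a list of rows (0-based indices)."""
    rows = []
    for m in range(n):  # m plays the role of (m - 1) in the text
        beta = 1 / math.sqrt(2) if m == 0 else 1.0
        rows.append([
            math.sqrt(2 / n) * beta * math.cos(math.pi * m * (2 * k + 1) / (2 * n))
            for k in range(n)  # 2k + 1 equals (2n - 1) for the 1-based column index
        ])
    return rows

def gram(matrix):
    """Return matrix * matrix^T."""
    return [[sum(a * b for a, b in zip(r, s)) for s in matrix] for r in matrix]

C8 = dct_matrix(8)
G = gram(C8)
# Largest deviation of C8 * C8^T from the identity (expected at rounding level).
orthogonality_error = max(abs(G[i][j] - (i == j)) for i in range(8) for j in range(8))
```

Under these assumptions, the second-row, first-column entry of `C8` equals $\gamma_0/2 = \cos(\pi/16)/2$, in agreement with the 8-point matrix.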



2.2 Feig-Winograd DCT Factorization 

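In [31], Feig and Winograd introduced a fast algorithm for the 1-D 8-point DCT, whose factorization can be given by:

$$
\mathbf{C}_8 = \frac{1}{2} \cdot \mathbf{P}_8 \cdot \mathbf{K}_8 \cdot \mathbf{B}_8^{(1)} \cdot \mathbf{B}_8^{(2)} \cdot \mathbf{B}_8^{(3)},
$$

where $\mathbf{P}_8$ is a signed permutation matrix, $\mathbf{K}_8$ is a multiplicative matrix, and $\mathbf{B}_8^{(1)}$,
$\mathbf{B}_8^{(2)}$, and $\mathbf{B}_8^{(3)}$ are symmetric additive matrices. These matrices are given by:

$$
\mathbf{P}_8 =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix},
$$

$$
\mathbf{B}_8^{(1)} = \operatorname{bdiag}\left( \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \mathbf{I}_6 \right),
\quad
\mathbf{B}_8^{(2)} = \operatorname{bdiag}\left( \begin{bmatrix} \mathbf{I}_2 & \bar{\mathbf{I}}_2 \\ \bar{\mathbf{I}}_2 & -\mathbf{I}_2 \end{bmatrix}, \mathbf{I}_4 \right),
\quad
\mathbf{B}_8^{(3)} = \begin{bmatrix} \mathbf{I}_4 & \bar{\mathbf{I}}_4 \\ \bar{\mathbf{I}}_4 & -\mathbf{I}_4 \end{bmatrix},
$$

and

$$
\mathbf{K}_8 =
\begin{bmatrix}
\gamma_3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \gamma_3 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \gamma_5 & \gamma_1 & 0 & 0 & 0 & 0 \\
0 & 0 & -\gamma_1 & \gamma_5 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -\gamma_6 & -\gamma_4 & -\gamma_2 & -\gamma_0 \\
0 & 0 & 0 & 0 & \gamma_4 & \gamma_0 & \gamma_6 & -\gamma_2 \\
0 & 0 & 0 & 0 & -\gamma_0 & \gamma_2 & -\gamma_4 & \gamma_6 \\
0 & 0 & 0 & 0 & -\gamma_2 & -\gamma_6 & \gamma_0 & -\gamma_4
\end{bmatrix},
$$

where $\mathbf{I}_\ell$ and $\bar{\mathbf{I}}_\ell$ denote the identity and counter-identity matrices of order $\ell$,
respectively, and $\operatorname{bdiag}(\cdot)$ is the block diagonal operator.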

The above factorization circumscribes the entire multiplicative complexity of the DCT into the block diagonal matrix
$\mathbf{K}_8$. Indeed, the seven distinct non-null elements of $\mathbf{K}_8$, namely $\gamma_i$, $i = 0,1,\ldots,6$,
are the only non-trivial quantities in the Feig-Winograd algorithm.
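As a numerical sanity check of the factorization, the sketch below multiplies the five factors and compares the result with the exact 8-point DCT. It assumes the signed permutation and the butterfly matrices have the block forms used in the code (a reading that is consistent with the inverse relations of the additive matrices stated in Section 2.4); all function names are illustrative.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def bdiag(*blocks):
    """Block diagonal matrix built from square blocks."""
    n = sum(len(b) for b in blocks)
    out = [[0.0] * n for _ in range(n)]
    off = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, v in enumerate(row):
                out[off + i][off + j] = v
        off += len(b)
    return out

def eye(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def butterfly(l):
    """Assumed additive block [[I, Ibar], [Ibar, -I]] with l x l sub-blocks."""
    out = [[0.0] * (2 * l) for _ in range(2 * l)]
    for i in range(l):
        out[i][i] = 1.0
        out[i][2 * l - 1 - i] = 1.0
        out[l + i][l - 1 - i] = 1.0
        out[l + i][l + i] = -1.0
    return out

def k_matrix(a):
    """Multiplicative block diagonal matrix with parameters a[0..6]."""
    a0, a1, a2, a3, a4, a5, a6 = a
    return bdiag([[a3]], [[a3]], [[a5, a1], [-a1, a5]],
                 [[-a6, -a4, -a2, -a0], [a4, a0, a6, -a2],
                  [-a0, a2, -a4, a6], [-a2, -a6, a0, -a4]])

def signed_permutation():
    """Assumed P8: output row i takes +/- the listed intermediate entry."""
    picks = [(0, 1), (4, -1), (2, 1), (5, -1), (1, 1), (7, -1), (3, 1), (6, 1)]
    p = [[0.0] * 8 for _ in range(8)]
    for row, (col, sign) in enumerate(picks):
        p[row][col] = float(sign)
    return p

def fw(a):
    """P8 * K8(a) * B1 * B2 * B3."""
    m = signed_permutation()
    for factor in (k_matrix(a), bdiag(butterfly(1), eye(6)),
                   bdiag(butterfly(2), eye(4)), butterfly(4)):
        m = matmul(m, factor)
    return m

gamma = [math.cos(2 * math.pi * (k + 1) / 32) for k in range(7)]
C8_fw = fw([g / 2 for g in gamma])
C8_ref = [[0.5 * (1 / math.sqrt(2) if m == 0 else 1.0)
           * math.cos(math.pi * m * (2 * n + 1) / 16) for n in range(8)]
          for m in range(8)]
max_diff = max(abs(C8_fw[i][j] - C8_ref[i][j]) for i in range(8) for j in range(8))
```

With the constants $\gamma_k/2$, the product reproduces the exact DCT up to rounding; replacing them by other values yields other members of the same structural family.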


2.3 Feig-Winograd Matrix Mapping 


The Feig-Winograd factorization paves the way for defining a class of $8 \times 8$ matrices generated according to the
following mapping:

$$
\operatorname{FW}: \mathbb{R}^7 \to \mathcal{M}(8), \qquad
\boldsymbol{\alpha} \mapsto \mathbf{P}_8 \cdot \mathbf{K}_8^{(\alpha)} \cdot \mathbf{B}_8^{(1)} \cdot \mathbf{B}_8^{(2)} \cdot \mathbf{B}_8^{(3)},
\tag{1}
$$

where $\boldsymbol{\alpha} = [\alpha_0\ \alpha_1\ \cdots\ \alpha_6]^\top$ is a 7-point parameter vector,
$\mathbf{K}_8^{(\alpha)}$ is the matrix $\mathbf{K}_8$ with each constant $\gamma_k$ replaced by the parameter $\alpha_k$,
and $\mathcal{M}(8)$ is the $8 \times 8$ matrix space over the real numbers.

The image of the multivariate function $\operatorname{FW}(\cdot)$ is a subset of $\mathcal{M}(8)$. It is straightforward
to verify that such subset is also closed under the operations of addition and scalar multiplication. Therefore, this
subset is a matrix subspace, which we refer to as the Feig-Winograd matrix subspace.

Mathematically, the Feig-Winograd factorization induces a matrix subspace by allowing its multiplicative constants
to be treated as parameters. Thus, an appropriate parameter selection may result in suitable approximations. Our ultimate
goal is to identify in this subspace matrices that could adequately approximate the DCT matrix. Hereafter we adopt the
following notation: $\mathbf{T}_8^{(\alpha)} = \operatorname{FW}(\boldsymbol{\alpha})$.

Considering the mapping described in (1), for any choice of $\boldsymbol{\alpha}$, $\mathbf{T}_8^{(\alpha)}$ satisfies
the Feig-Winograd factorization. Thus, all matrices in this particular subspace possess the same general fast algorithm
structure, which is shown in Fig. 1.
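The shared fast algorithm structure can be sketched as three butterfly stages followed by the parameter-dependent stage. The code below is an illustrative reading of that structure, not a transcription of the signal flow graph; `fw_apply` and `fw_matrix` are hypothetical names, and `fw_matrix` simply writes out the matrix pattern of the 8-point DCT with each constant replaced by a parameter.

```python
def fw_matrix(a):
    """Explicit 8 x 8 matrix of the family for parameters a[0..6]."""
    a0, a1, a2, a3, a4, a5, a6 = a
    return [
        [a3] * 8,
        [a0, a2, a4, a6, -a6, -a4, -a2, -a0],
        [a1, a5, -a5, -a1, -a1, -a5, a5, a1],
        [a2, -a6, -a0, -a4, a4, a0, a6, -a2],
        [a3, -a3, -a3, a3, a3, -a3, -a3, a3],
        [a4, -a0, a6, a2, -a2, -a6, a0, -a4],
        [a5, -a1, a1, -a5, -a5, a1, -a1, a5],
        [a6, -a4, a2, -a0, a0, -a2, a4, -a6],
    ]

def fw_apply(a, x):
    """Compute T(a) * x with 14 butterfly additions plus the final stage."""
    a0, a1, a2, a3, a4, a5, a6 = a
    u = [x[i] + x[7 - i] for i in range(4)]            # first stage, sums
    w = [x[3 - i] - x[4 + i] for i in range(4)]        # first stage, differences
    v = [u[0] + u[3], u[1] + u[2], u[1] - u[2], u[0] - u[3]]  # second stage
    e0, e1 = v[0] + v[1], v[0] - v[1]                  # third stage
    out = [0] * 8
    out[0] = a3 * e0
    out[4] = a3 * e1
    out[2] = a5 * v[2] + a1 * v[3]
    out[6] = -a1 * v[2] + a5 * v[3]
    out[1] = a6 * w[0] + a4 * w[1] + a2 * w[2] + a0 * w[3]
    out[3] = -a4 * w[0] - a0 * w[1] - a6 * w[2] + a2 * w[3]
    out[5] = a2 * w[0] + a6 * w[1] - a0 * w[2] + a4 * w[3]
    out[7] = -a0 * w[0] + a2 * w[1] - a4 * w[2] + a6 * w[3]
    return out
```

When the parameters are restricted to values such as 0, ±1/2, ±1, and ±2, every product in the last stage reduces to a sign change, a bit shift, or nothing at all.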


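2.4 Inverse Transformation

Assuming the existence of the inverse of $\mathbf{T}_8^{(\alpha)}$, by direct computation, we obtain the following expression:

$$
\left( \mathbf{T}_8^{(\alpha)} \right)^{-1} =
\left( \mathbf{B}_8^{(3)} \right)^{-1} \cdot \left( \mathbf{B}_8^{(2)} \right)^{-1} \cdot \left( \mathbf{B}_8^{(1)} \right)^{-1} \cdot
\left( \mathbf{K}_8^{(\alpha)} \right)^{-1} \cdot \mathbf{P}_8^{-1}.
\tag{2}
$$

(b) Block A

Figure 1: Signal flow graph for the transformations defined on the Feig-Winograd matrix space. Input data $x_n$,
$n = 0,1,\ldots,7$, relates to output $X_m$, $m = 0,1,\ldots,7$, according to
$\mathbf{X} = \mathbf{T}_8^{(\alpha)} \cdot \mathbf{x}$. Dashed arrows represent multiplications by $-1$.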
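However, we notice that the following relationships hold true:

$$
\mathbf{P}_8^{-1} = \mathbf{P}_8^\top, \quad
\left( \mathbf{B}_8^{(1)} \right)^{-1} = \operatorname{diag}\left( \tfrac{1}{2} \mathbf{I}_2, \mathbf{I}_6 \right) \cdot \mathbf{B}_8^{(1)}, \quad
\left( \mathbf{B}_8^{(2)} \right)^{-1} = \operatorname{diag}\left( \tfrac{1}{2} \mathbf{I}_4, \mathbf{I}_4 \right) \cdot \mathbf{B}_8^{(2)}, \quad
\left( \mathbf{B}_8^{(3)} \right)^{-1} = \tfrac{1}{2} \cdot \mathbf{B}_8^{(3)}.
$$

Thus, straightforward matrix manipulation yields:

$$
\left( \mathbf{T}_8^{(\alpha)} \right)^{-1} =
\mathbf{B}_8^{(3)} \cdot \mathbf{B}_8^{(2)} \cdot \mathbf{B}_8^{(1)} \cdot \left( \mathbf{K}_8^{(\alpha)} \right)^{-1} \cdot \mathbf{P}_8^\top \cdot \mathbf{D}_8,
\tag{3}
$$

where $\mathbf{D}_8 = \operatorname{diag}\left( \tfrac{1}{8}, \tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{2}, \tfrac{1}{8}, \tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{2} \right)$.

Matrices $\mathbf{B}_8^{(1)}$, $\mathbf{B}_8^{(2)}$, and $\mathbf{B}_8^{(3)}$ represent butterfly operations. Matrix
$\mathbf{P}_8^\top$ is a simple permutation. Because $\mathbf{K}_8^{(\alpha)}$ is a block diagonal matrix, its inverse is
also block diagonal. In fact, $(\mathbf{K}_8^{(\alpha)})^{-1}$ is closely related to $\mathbf{K}_8^{(\alpha)}$, both
possessing a very similar structure. By means of symbolic computation [53], we obtain that: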


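$$
\left( \mathbf{K}_8^{(\alpha)} \right)^{-1} = \left( \mathbf{K}_8^{(\alpha')} \right)^\top,
\tag{4}
$$

where $\mathbf{K}_8^{(\alpha')}$ has the same structure as $\mathbf{K}_8^{(\alpha)}$ in (1), with parameter vector
$\boldsymbol{\alpha}' = [\alpha'_0\ \alpha'_1\ \alpha'_2\ \alpha'_3\ \alpha'_4\ \alpha'_5\ \alpha'_6]^\top$ given by

$$
\begin{aligned}
\alpha'_0 &= [\alpha_0 \alpha_6^2 + (\alpha_2^2 - \alpha_4^2)\,\alpha_6 + 2 \alpha_0 \alpha_2 \alpha_4 + \alpha_0^3] / \Delta, \\
\alpha'_1 &= \alpha_1 / [\alpha_1^2 + \alpha_5^2], \\
\alpha'_2 &= [\alpha_2 \alpha_4^2 + (\alpha_0^2 - \alpha_6^2)\,\alpha_4 + 2 \alpha_0 \alpha_2 \alpha_6 + \alpha_2^3] / \Delta, \\
\alpha'_3 &= 1 / \alpha_3, \\
\alpha'_4 &= [\alpha_4 \alpha_2^2 + (\alpha_0^2 - \alpha_6^2)\,\alpha_2 - 2 \alpha_0 \alpha_4 \alpha_6 + \alpha_4^3] / \Delta, \\
\alpha'_5 &= \alpha_5 / [\alpha_1^2 + \alpha_5^2], \\
\alpha'_6 &= [\alpha_6 \alpha_0^2 + (\alpha_2^2 - \alpha_4^2)\,\alpha_0 - 2 \alpha_2 \alpha_4 \alpha_6 + \alpha_6^3] / \Delta,
\end{aligned}
\tag{5}
$$

and

$$
\Delta = (\alpha_0^2 + \alpha_6^2)^2 + (\alpha_2^2 + \alpha_4^2)^2 + 4\,(\alpha_0 \alpha_2 - \alpha_4 \alpha_6)(\alpha_2 \alpha_6 + \alpha_0 \alpha_4).
\tag{6}
$$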

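The inverse of $\mathbf{T}_8^{(\alpha)}$ does exist as long as $\mathbf{K}_8^{(\alpha')}$ is also well-defined. From the
set of equations (5), the following conditions are necessary for the existence of $(\mathbf{T}_8^{(\alpha)})^{-1}$:

(i) $\alpha_3 \neq 0$,

(ii) $\alpha_1^2 + \alpha_5^2 \neq 0$,

(iii) $\alpha_0^2 + \alpha_2^2 + \alpha_4^2 + \alpha_6^2 \neq 0$.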
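Applying (4) in (3) and using (1), we directly obtain that:

$$
\left( \mathbf{T}_8^{(\alpha)} \right)^{-\top}
= \mathbf{D}_8 \cdot \mathbf{P}_8 \cdot \mathbf{K}_8^{(\alpha')} \cdot \mathbf{B}_8^{(1)} \cdot \mathbf{B}_8^{(2)} \cdot \mathbf{B}_8^{(3)}
= \mathbf{D}_8 \cdot \mathbf{T}_8^{(\alpha')},
\tag{7}
$$

where $\mathbf{D}_8 = \operatorname{diag}\left( \tfrac{1}{8}, \tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{2}, \tfrac{1}{8}, \tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{2} \right)$
and $(\mathbf{T}_8^{(\alpha)})^{-\top}$ is the transpose inverse of $\mathbf{T}_8^{(\alpha)}$. Notice that
$\mathbf{K}_8^{(\alpha')}$ is shaped as $\mathbf{K}_8^{(\alpha)}$. As a consequence, from (7) we can conclude that the fast
algorithm for the inverse transformation is obtained by simply replacing $\boldsymbol{\alpha}$ of the direct transform by
$\boldsymbol{\alpha}'$ (as described in (5)) and then applying the transposition operation. Thus, the hardware
implementation of the inverse transformation is facilitated.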
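The closed-form inverse parameters lend themselves to an exact check with rational arithmetic. The sketch below assumes the expressions for the primed parameters and for the determinant-like quantity as read in this section, and a diagonal scaling diag(1/8, 1/2, 1/4, 1/2, 1/8, 1/2, 1/4, 1/2); the names `inverse_parameters`, `fw_matrix`, and `is_inverse` are illustrative.

```python
from fractions import Fraction

def fw_matrix(a):
    """Explicit 8 x 8 matrix of the family for parameters a[0..6]."""
    a0, a1, a2, a3, a4, a5, a6 = a
    return [
        [a3] * 8,
        [a0, a2, a4, a6, -a6, -a4, -a2, -a0],
        [a1, a5, -a5, -a1, -a1, -a5, a5, a1],
        [a2, -a6, -a0, -a4, a4, a0, a6, -a2],
        [a3, -a3, -a3, a3, a3, -a3, -a3, a3],
        [a4, -a0, a6, a2, -a2, -a6, a0, -a4],
        [a5, -a1, a1, -a5, -a5, a1, -a1, a5],
        [a6, -a4, a2, -a0, a0, -a2, a4, -a6],
    ]

def inverse_parameters(a):
    """Primed parameter vector; raises ValueError when the inverse does not exist."""
    a0, a1, a2, a3, a4, a5, a6 = (Fraction(v) for v in a)
    delta = ((a0**2 + a6**2)**2 + (a2**2 + a4**2)**2
             + 4 * (a0 * a2 - a4 * a6) * (a2 * a6 + a0 * a4))
    n15 = a1**2 + a5**2
    if a3 == 0 or n15 == 0 or delta == 0:
        raise ValueError("parameter vector does not yield an invertible matrix")
    return [
        (a0 * a6**2 + (a2**2 - a4**2) * a6 + 2 * a0 * a2 * a4 + a0**3) / delta,
        a1 / n15,
        (a2 * a4**2 + (a0**2 - a6**2) * a4 + 2 * a0 * a2 * a6 + a2**3) / delta,
        1 / a3,
        (a4 * a2**2 + (a0**2 - a6**2) * a2 - 2 * a0 * a4 * a6 + a4**3) / delta,
        a5 / n15,
        (a6 * a0**2 + (a2**2 - a4**2) * a0 - 2 * a2 * a4 * a6 + a6**3) / delta,
    ]

def is_inverse(a):
    """Check exactly that T(a) * T(a')^T * D equals the identity matrix."""
    d = [Fraction(1, 8), Fraction(1, 2), Fraction(1, 4), Fraction(1, 2)] * 2
    t = fw_matrix([Fraction(v) for v in a])
    tp = fw_matrix(inverse_parameters(a))
    return all(
        sum(t[i][k] * tp[j][k] for k in range(8)) * d[j] == (1 if i == j else 0)
        for i in range(8) for j in range(8)
    )
```

For instance, `inverse_parameters([1, 1, 1, 1, 1, 1, 1])` can be used to inspect the inverse of the all-ones parameter vector, and `is_inverse` confirms the result without floating-point error.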

2.5 Particular Known Matrices 

Based on different values of the parameter vector $\boldsymbol{\alpha}$, distinct Feig-Winograd approximation matrices
can be obtained. In this subsection, we furnish a list of several transforms archived in literature, which are particular
cases encompassed in the proposed Feig-Winograd formalism. Various well-known approximations of the DCT matrix
$\mathbf{C}_8$ are among these identified cases, as shown below.


2.5.1 Exact DCT

Notice that for $\boldsymbol{\alpha}_0 = [\gamma_0\ \gamma_1\ \cdots\ \gamma_6]^\top$, we have that
$\operatorname{FW}(\boldsymbol{\alpha}_0 / 2) = \mathbf{C}_8$.


2.5.2 Signed DCT

Let $\operatorname{sign}(\cdot)$ be the signum function. For
$\boldsymbol{\alpha}_1 = \operatorname{sign}(\boldsymbol{\alpha}_0) = [1\ 1\ 1\ 1\ 1\ 1\ 1]^\top$, matrix
$\operatorname{FW}(\boldsymbol{\alpha}_1)$ is the low-complexity matrix associated to the 8-point SDCT.


2.5.3 Level 1 Approximation

For $\boldsymbol{\alpha}_2 = [1\ 1\ 1\ 1\ 1\ 1/2\ 0]^\top$, according to (1) we have that
$\operatorname{FW}(\boldsymbol{\alpha}_2)$ is the low-complexity matrix of the level 1 approximation by
Lengwehasatit-Ortega [43].


2.5.4 Rounded DCT 


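Let $\operatorname{round}(\cdot)$ be the round-off function as implemented in the C or Matlab languages [53]. Considering
$\boldsymbol{\alpha}_3 = \operatorname{round}(\boldsymbol{\alpha}_0) = [1\ 1\ 1\ 1\ 1\ 0\ 0]^\top$,
$\operatorname{FW}(\boldsymbol{\alpha}_3)$ is the approximate 8-point DCT introduced in [48].


2.5.5 Modified Rounded DCT

If $\boldsymbol{\alpha}_4 = [1\ 1\ 0\ 1\ 0\ 0\ 0]^\top$, then $\operatorname{FW}(\boldsymbol{\alpha}_4)$ results in the
modified rounded 8-point DCT introduced in [40].


2.5.6 DCT Approximation for RF Imaging

If $\boldsymbol{\alpha}_5 = [2\ 2\ 1\ 1\ 1\ 1\ 0]^\top$, then $\operatorname{FW}(\boldsymbol{\alpha}_5)$ is the
multiplier-free 8-point DCT approximation for RF imaging proposed in [49].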
2.5.7 Haar Matrix

Let $\mathbf{H}_8$ be the non-normalized Haar matrix of order 8 [54, p. 159]. For
$\boldsymbol{\alpha}_6 = [0\ 0\ 0\ 1\ 1\ 1\ 0]^\top$, we have that
$\mathbf{H}_8 = \mathbf{P}_8^{(1)} \cdot \operatorname{FW}(\boldsymbol{\alpha}_6) \cdot \mathbf{P}_8^{(2)}$, where
$\mathbf{P}_8^{(1)} = (1)(8\ 2\ 5\ 6\ 4\ 3\ 7)$ and $\mathbf{P}_8^{(2)} = (1)(8\ 2\ 6)(5\ 3\ 7)(4)$ are permutation
matrices denoted in cyclic notation [55, p. 64].


2.5.8 8-point transform employed in AVC/H.264

If $\boldsymbol{\alpha}_7 = [12\ 8\ 10\ 8\ 6\ 4\ 3]^\top$, then $\operatorname{FW}(\boldsymbol{\alpha}_7)$ is the integer
DCT employed in AVC/H.264 [56].


2.5.9 8-point transform employed in HEVC/H.265

If $\boldsymbol{\alpha}_8 = [89\ 83\ 75\ 64\ 50\ 36\ 18]^\top$, then $\operatorname{FW}(\boldsymbol{\alpha}_8)$ is the
8-point transform employed in HEVC/H.265 [14, 20, 57].


3 Approximation Criteria in the Feig-Winograd Matrix Space

We aim at investigating conditions under which $\mathbf{T}_8^{(\alpha)}$ could provide adequate approximations for the
8-point DCT. To guide our search we adopt the following general criteria related to $\mathbf{T}_8^{(\alpha)}$:

1. it must possess low computational complexity;

2. it must satisfy orthogonality or near orthogonality [58];

3. its inverse transformation must possess low computational complexity;

4. it must be a close approximation to the exact DCT matrix according to meaningful proximity measures.

3.1 Computational Complexity 

The computational complexity of the Feig-Winograd structure is essentially quantified by its arithmetic complexity,
which is furnished by the number of multiplications, additions, and bit-shifting operations required for its calculation.

The multiplicative count is due to the elements $\alpha_k$, $k = 0,1,\ldots,6$, in matrix $\mathbf{K}_8^{(\alpha)}$.
However, for judiciously selected values of $\boldsymbol{\alpha}$, the resulting multiplicative count can be lower. One
possibility is to restrict the elements $\alpha_k$, $k = 0,1,\ldots,6$, to the set of dyadic rationals.

Although this representation may lead to good approximations, it implies an increase in the additive complexity as
well as in the number of required bit-shifting operations [60]. A more effective approach is to consider only zero
adder representation quantities, i.e., numbers whose binary representation requires no extra adders. This
is true for powers of two. Aiming at the full computational complexity minimization, we further restricted the choice
of $\alpha_k$, $k = 0,1,\ldots,6$, to the following set of numbers: $\mathcal{P} = \{0, \pm 1/2, \pm 1, \pm 2\}$. In terms
of digital arithmetic circuits, the multiplication by such elements requires no additions and only minimal bit-shifting
operations. This implies null multiplicative complexity in the considered DCT approximations.

Over the set $\mathcal{P}$, the worst case scenario in terms of computational complexity is to select non-null
parameters in $\{\pm 1/2, \pm 2\}$. This would imply 28 additions and 22 bit-shifting operations. Considering the already
identified DCT approximations in the Feig-Winograd matrix space, the expected complexity for good approximations may be
typically lower. For example, the 8-point SDCT and the rounded DCT, both in the Feig-Winograd matrix subspace, require
24/0 and 22/0 additions/bit-shifting operations, respectively [6, 48].

Table 1: Arithmetic complexity of the Feig-Winograd fast algorithm according to the employed number representation

Number representation      Mult.   Add.         Bit-shifting
Floating point             22      28           0
8-bit dyadic rationals     0       42           21
Zero adder rationals       0       at most 28   at most 22

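Expression (1) and Fig. 1 suggest that the additive complexity of the Feig-Winograd transformations
is equal to 14 plus the additions confined in matrix $\mathbf{K}_8^{(\alpha)}$. Similarly, the number of bit-shifting
operations is fully determined by the elements of $\mathbf{K}_8^{(\alpha)}$. Let $\partial(\cdot)$ and $\theta(\cdot)$ be
the functions that return, respectively, the number of non-null elements and the number of elements in
$\{\pm 1/2, \pm 2\}$ of their vector arguments. For $\boldsymbol{\alpha} \in \mathcal{P}^7$, the addition and bit-shifting
counts, respectively denoted by $\mathcal{A}(\boldsymbol{\alpha})$ and $\mathcal{S}(\boldsymbol{\alpha})$, are given by:

$$
\mathcal{A}(\boldsymbol{\alpha}) = 14 + 2 \cdot \max\left\{ 1, \partial\left( [\alpha_1\ \alpha_5]^\top \right) \right\}
+ 4 \cdot \max\left\{ 1, \partial\left( [\alpha_0\ \alpha_2\ \alpha_4\ \alpha_6]^\top \right) \right\} - 6,
\tag{8}
$$

$$
\mathcal{S}(\boldsymbol{\alpha}) = 2 \cdot \theta(\alpha_3) + 2 \cdot \theta\left( [\alpha_1\ \alpha_5]^\top \right)
+ 4 \cdot \theta\left( [\alpha_0\ \alpha_2\ \alpha_4\ \alpha_6]^\top \right).
\tag{9}
$$

By inspecting the above expressions, we notice that $14 \leq \mathcal{A}(\boldsymbol{\alpha}) \leq 28$ and
$0 \leq \mathcal{S}(\boldsymbol{\alpha}) \leq 22$, for $\boldsymbol{\alpha} \in \mathcal{P}^7$. Thus, the theoretical
lower bound for the complexity of the Feig-Winograd matrices is 14 additions. Table 1 summarizes the operation counts
discussed above.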
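The two operation counts can be sketched as small counting functions. The code assumes the count expressions as read in this section (14 butterfly additions plus the additions inside the multiplicative stage, and one bit shift per occurrence of a parameter in {±1/2, ±2}); the function names are illustrative.

```python
def addition_count(a):
    """Additions of the fast algorithm for a parameter vector a[0..6]."""
    a0, a1, a2, a3, a4, a5, a6 = a
    nonzero = lambda values: sum(1 for v in values if v != 0)
    return (14
            + 2 * max(1, nonzero([a1, a5]))
            + 4 * max(1, nonzero([a0, a2, a4, a6]))
            - 6)

def shift_count(a):
    """Bit-shifting operations for a parameter vector a[0..6]."""
    a0, a1, a2, a3, a4, a5, a6 = a
    shifted = lambda values: sum(1 for v in values if abs(v) in (0.5, 2))
    return 2 * shifted([a3]) + 2 * shifted([a1, a5]) + 4 * shifted([a0, a2, a4, a6])

rounded_dct = [1, 1, 1, 1, 1, 0, 0]
worst_case = [2, 0.5, 2, 0.5, 2, 0.5, 2]
counts_rounded = (addition_count(rounded_dct), shift_count(rounded_dct))
counts_worst = (addition_count(worst_case), shift_count(worst_case))
```

Under this reading, the rounded DCT parameter vector gives the 22/0 count quoted above, and a vector with all entries in {±1/2, ±2} reaches the upper bounds of 28 additions and 22 bit shifts.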


3.2 Orthogonality or Near Orthogonality 


Orthogonality is often a desirable property sought in a DCT approximation matrix [43]. Among several
orthogonalization methods [61, 62], we separate the one based on the polar decomposition [63, 64]. To orthogonalize
$\mathbf{T}_8^{(\alpha)}$, such procedure requires only one matrix given by

$$
\mathbf{S}_8 = \sqrt{ \left[ \mathbf{T}_8^{(\alpha)} \cdot \left( \mathbf{T}_8^{(\alpha)} \right)^\top \right]^{-1} },
$$

where $\sqrt{\cdot}$ denotes the matrix square root operation [53, 66]. The resulting orthogonal DCT approximation is
furnished by:

$$
\hat{\mathbf{C}}_8 = \mathbf{S}_8 \cdot \mathbf{T}_8^{(\alpha)}.
\tag{10}
$$

Importantly, this method preserves the structure and low complexity of $\mathbf{T}_8^{(\alpha)}$ [41, 43, 45, 48, 67]
and allows lossless transformation.

In the context of image compression, if $\mathbf{S}_8$ is a diagonal matrix, then it does not introduce any additional
computational overhead. In this case, matrix $\mathbf{S}_8$ can be merged into the quantization step of JPEG-like
compression schemes [40-43, 45, 48].
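For matrix $\mathbf{S}_8$ to be diagonal, it is sufficient that [65]:

$$
\mathbf{T}_8^{(\alpha)} \cdot \left( \mathbf{T}_8^{(\alpha)} \right)^\top = [\text{diagonal matrix}].
\tag{11}
$$

By explicitly calculating $\mathbf{T}_8^{(\alpha)} \cdot (\mathbf{T}_8^{(\alpha)})^\top$ and using symbolic computation [53],
we obtain that a sufficient condition for (11) to hold true is:

$$
\alpha_0 \cdot (\alpha_2 - \alpha_4) = \alpha_6 \cdot (\alpha_2 + \alpha_4).
\tag{12}
$$

If $\mathbf{T}_8^{(\alpha)}$ satisfies (11) or (12), then we have that

$$
\mathbf{S}_8 = \operatorname{diag}(s_0, s_1, s_2, s_1, s_0, s_1, s_2, s_1),
$$

where $s_0 = 1/(2\sqrt{2}\,|\alpha_3|)$, $s_1 = 1/\sqrt{2\,(\alpha_0^2 + \alpha_2^2 + \alpha_4^2 + \alpha_6^2)}$, and
$s_2 = 1/(2\sqrt{\alpha_1^2 + \alpha_5^2})$.

If matrix $\mathbf{T}_8^{(\alpha)}$ does not satisfy (11), but $\mathbf{T}_8^{(\alpha)} \cdot (\mathbf{T}_8^{(\alpha)})^\top$
is a nearly diagonal matrix [58], then $\mathbf{S}_8$ is also nearly diagonal. Thus, the following approximation for the
orthogonalization matrix can be taken into consideration:

$$
\tilde{\mathbf{S}}_8 = \sqrt{ \left\{ \operatorname{diag}\left[ \mathbf{T}_8^{(\alpha)} \cdot \left( \mathbf{T}_8^{(\alpha)} \right)^\top \right] \right\}^{-1} },
\tag{13}
$$

where $\operatorname{diag}(\cdot)$ returns a diagonal matrix with the same diagonal elements of its matrix argument
[69, p. 2]; i.e., $\operatorname{diag}(\cdot)$ operates consistently with Matlab usage [53]. Consequently, the obtained
nearly orthogonal approximation is given by

$$
\tilde{\mathbf{C}}_8 = \tilde{\mathbf{S}}_8 \cdot \mathbf{T}_8^{(\alpha)}.
\tag{14}
$$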


To quantify how close a matrix is to the diagonal form, we adopt the deviation from diagonality measure [58], which
is described as follows. Let $\mathbf{M}$ be a square matrix. The deviation from diagonality measure of $\mathbf{M}$ is
given by:

$$
\delta(\mathbf{M}) = 1 - \frac{\| \operatorname{diag}(\mathbf{M}) \|_F^2}{\| \mathbf{M} \|_F^2},
\tag{15}
$$

where $\| \cdot \|_F$ denotes the Frobenius norm [61, p. 115]. For diagonal matrices, function $\delta(\cdot)$ returns
zero. Both the 8-point SDCT and the BAS approximation proposed in [44] are nonorthogonal and good DCT approximations.
Their deviations from diagonality are equal to 0.20 and 0.1774, respectively. Thus we adopt these
particular measurements as reference values for identifying nearly diagonal orthogonalization matrices in the context of
DCT approximation.
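The diagonality test and the deviation measure can be sketched together. The code below assumes that the measure is applied to the product of the low-complexity matrix with its transpose and that `fw_matrix` writes out the explicit matrix pattern of the family; both the helper names and that choice of argument are assumptions made for illustration.

```python
def fw_matrix(a):
    """Explicit 8 x 8 matrix of the family for parameters a[0..6]."""
    a0, a1, a2, a3, a4, a5, a6 = a
    return [
        [a3] * 8,
        [a0, a2, a4, a6, -a6, -a4, -a2, -a0],
        [a1, a5, -a5, -a1, -a1, -a5, a5, a1],
        [a2, -a6, -a0, -a4, a4, a0, a6, -a2],
        [a3, -a3, -a3, a3, a3, -a3, -a3, a3],
        [a4, -a0, a6, a2, -a2, -a6, a0, -a4],
        [a5, -a1, a1, -a5, -a5, a1, -a1, a5],
        [a6, -a4, a2, -a0, a0, -a2, a4, -a6],
    ]

def gram(t):
    """Return t * t^T."""
    return [[sum(p * q for p, q in zip(r, s)) for s in t] for r in t]

def satisfies_sufficient_condition(a):
    """Sufficient condition for t * t^T to be diagonal."""
    a0, a1, a2, a3, a4, a5, a6 = a
    return a0 * (a2 - a4) == a6 * (a2 + a4)

def deviation_from_diagonality(m):
    """1 - ||diag(m)||_F^2 / ||m||_F^2 for a square matrix m."""
    total = sum(v * v for row in m for v in row)
    diagonal = sum(m[i][i] ** 2 for i in range(len(m)))
    return 1 - diagonal / total

sdct = [1, 1, 1, 1, 1, 1, 1]
rounded_dct = [1, 1, 1, 1, 1, 0, 0]
dev_sdct = deviation_from_diagonality(gram(fw_matrix(sdct)))
dev_rounded = deviation_from_diagonality(gram(fw_matrix(rounded_dct)))
```

Under these assumptions the SDCT parameter vector reproduces the reference value 0.20, while a vector satisfying the sufficient condition gives zero deviation.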


3.3 Structure and Complexity of Inverse Transformation 

Not only is it important to identify low-complexity approximations, but also to guarantee that the associated inverse
transformations possess low computational complexity.

For orthogonalizable approximations, the following holds true:
$\hat{\mathbf{C}}_8^{-1} = \hat{\mathbf{C}}_8^\top = (\mathbf{T}_8^{(\alpha)})^\top \cdot \mathbf{S}_8$. Therefore, in this
case, the inverse transformation inherits the low-complexity properties of $\mathbf{T}_8^{(\alpha)}$. Moreover, for image
compression purposes, matrix $\mathbf{S}_8$ can also be absorbed in the quantization step.

For the nonorthogonal case, let us assume that $\mathbf{T}_8^{(\alpha)}$ is a low-complexity transformation. The set of
equations (5) furnishes closed-form expressions for the multiplicative elements $\alpha'_k$, $k = 0,1,\ldots,6$, of
$(\mathbf{T}_8^{(\alpha)})^{-1}$. Thus, for the inverse transformation to possess null multiplicative complexity, it is
sufficient to ensure that $\alpha'_k \in \mathcal{P}$ for $k = 0,1,\ldots,6$.

As an illustration, we consider the 8-point SDCT, which is a nonorthogonal transform. The SDCT low-complexity
matrix is defined on the Feig-Winograd space with parameter vector given by
$\boldsymbol{\alpha}_1 = [1\ 1\ 1\ 1\ 1\ 1\ 1]^\top$. We notice that
$\operatorname{FW}(\boldsymbol{\alpha}_1) \cdot \operatorname{FW}(\boldsymbol{\alpha}'_1)^\top = 8 \cdot \mathbf{I}_8$, where
$\boldsymbol{\alpha}'_1 = [2\ 1\ 2\ 1\ 0\ 1\ 0]^\top$ follows from the set of equations (5) once the diagonal scaling
in (7) is absorbed into the parameters.
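The SDCT illustration can be reproduced with integer arithmetic. The sketch assumes the explicit matrix pattern of the family (helper `fw_matrix`, an illustrative name) and the product with the transposed second factor.

```python
def fw_matrix(a):
    """Explicit 8 x 8 matrix of the family for parameters a[0..6]."""
    a0, a1, a2, a3, a4, a5, a6 = a
    return [
        [a3] * 8,
        [a0, a2, a4, a6, -a6, -a4, -a2, -a0],
        [a1, a5, -a5, -a1, -a1, -a5, a5, a1],
        [a2, -a6, -a0, -a4, a4, a0, a6, -a2],
        [a3, -a3, -a3, a3, a3, -a3, -a3, a3],
        [a4, -a0, a6, a2, -a2, -a6, a0, -a4],
        [a5, -a1, a1, -a5, -a5, a1, -a1, a5],
        [a6, -a4, a2, -a0, a0, -a2, a4, -a6],
    ]

sdct = fw_matrix([1, 1, 1, 1, 1, 1, 1])
sdct_inverse_like = fw_matrix([2, 1, 2, 1, 0, 1, 0])

# Product of the SDCT matrix with the transpose of the second matrix.
product = [[sum(p * q for p, q in zip(row, other)) for other in sdct_inverse_like]
           for row in sdct]
is_scaled_identity = all(product[i][j] == (8 if i == j else 0)
                         for i in range(8) for j in range(8))
```

Because the second parameter vector contains only 0, 1, and 2, the inverse of the SDCT matrix also requires no multiplications, only additions, bit shifts, and a final scaling by 1/8.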


3.4 Proximity Measures 

In order to evaluate candidate DCT approximations, we consider the following figures of merit: (i) the total error
energy [48]; (ii) the mean square error (MSE) [70]; (iii) the unified coding gain [71, 72]; and (iv) the transform
efficiency. The total error energy and MSE are employed to quantify how close a given DCT approximation
$\hat{\mathbf{C}}_N$ is to the exact DCT matrix $\mathbf{C}_N$. The coding gain and transform efficiency capture the
coding performance of a given transformation.

For coding performance evaluation, we assume that the data are modeled as a first-order Gaussian Markov process
with zero mean, unit variance, and correlation coefficient $\rho = 0.95$. Then, the $(m,n)$-th element of the covariance
matrix $\mathbf{R}_x$ of the input signal $\mathbf{x}$ is given by $r_{m,n}^{(x)} = \rho^{|m-n|}$. Natural images satisfy
the above assumptions. Below we briefly describe the selected figures of merit.


3.4.1 Total Error Energy

Each row of a given transform matrix can be understood as a finite impulse response filter with an associated transfer
function [65, 73]. Based on this interpretation, the magnitude of the difference between the transfer functions of the DCT
and the SDCT was advanced in [48] as a similarity measure. Such measure, termed total error energy, was further employed
as a proximity measure between several other approximations [68]. Although originally defined on the spectral domain,
the total error energy can be given a simple matrix form by means of the Parseval theorem [54, p. 18]. The total error
energy $\epsilon$ for a given DCT approximation matrix $\hat{\mathbf{C}}_N$ is furnished by
$\epsilon = \pi \cdot \| \mathbf{C}_N - \hat{\mathbf{C}}_N \|_F^2$.


3.4.2 Mean Square Error

The mean square error (MSE) for an approximation matrix $\hat{\mathbf{C}}_N$ is defined as

$$
\mathrm{MSE} = \frac{1}{N} \cdot \operatorname{tr}\left( (\mathbf{C}_N - \hat{\mathbf{C}}_N) \cdot \mathbf{R}_x \cdot (\mathbf{C}_N - \hat{\mathbf{C}}_N)^\top \right),
$$

where $\operatorname{tr}(\cdot)$ is the trace function [74].

To maintain the compatibility between the approximation and the exact DCT outputs, the MSE between the DCT
and the approximation coefficients should be minimized.


3.4.3 Unified Transform Coding Gain 


We adopt the unified coding gain, which generalizes the usual coding gain [71]. Let $\mathbf{h}_k$ and $\mathbf{g}_k$ be
the $k$th row of $\hat{\mathbf{C}}_N$ and $\hat{\mathbf{C}}_N^{-1}$, respectively. Thus, the coding gain of
$\hat{\mathbf{C}}_N$ is given by:

$$
C_g = 10 \cdot \log_{10} \left\{ \prod_{k=1}^{N} \frac{1}{\sqrt[N]{A_k \cdot B_k}} \right\} \quad \text{(in dB)},
$$

where $A_k = \operatorname{su}[(\mathbf{h}_k^\top \cdot \mathbf{h}_k) \circ \mathbf{R}_x]$, $\operatorname{su}(\cdot)$
returns the sum of the elements of its matrix argument [74], operator $\circ$ is the element-wise matrix product,
$B_k = \| \mathbf{g}_k \|_2^2$, and $\| \cdot \|_2$ is the Euclidean norm.

Approximations exhibiting high transform coding gains can compact more energy into few coefficients. The
transform coding gains for the KLT and the 8-point DCT are 8.8462 and 8.8259, respectively.


3.4.4 Transform Efficiency 

Let $r_{m,n}^{(X)}$ be the $(m,n)$-th entry of the covariance matrix of the transformed signal $\mathbf{X}$, which is
given by $\mathbf{R}_X = \hat{\mathbf{C}}_N \cdot \mathbf{R}_x \cdot \hat{\mathbf{C}}_N^\top$.
The transform efficiency is defined as [2, 75]

$$
\eta = \frac{\sum_{m=1}^{N} \left| r_{m,m}^{(X)} \right|}{\sum_{m=1}^{N} \sum_{n=1}^{N} \left| r_{m,n}^{(X)} \right|} \cdot 100.
$$

The transform efficiency $\eta$ measures the decorrelation ability of the transform. The optimal KLT converts signals
into completely uncorrelated coefficients and has a transform efficiency equal to 100, for any value of $\rho$.
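The four figures of merit can be sketched compactly for the orthogonal case. The code assumes an orthogonal approximation (so the inverse is the transpose), the first-order Markov covariance with correlation 0.95, and row normalization as the diagonal orthogonalization step; every function name here is illustrative.

```python
import math

RHO, N = 0.95, 8
R_X = [[RHO ** abs(i - j) for j in range(N)] for i in range(N)]

def dct_matrix(n):
    return [[math.sqrt(2 / n) * (1 / math.sqrt(2) if m == 0 else 1.0)
             * math.cos(math.pi * m * (2 * k + 1) / (2 * n)) for k in range(n)]
            for m in range(n)]

def fw_matrix(a):
    a0, a1, a2, a3, a4, a5, a6 = a
    return [
        [a3] * 8,
        [a0, a2, a4, a6, -a6, -a4, -a2, -a0],
        [a1, a5, -a5, -a1, -a1, -a5, a5, a1],
        [a2, -a6, -a0, -a4, a4, a0, a6, -a2],
        [a3, -a3, -a3, a3, a3, -a3, -a3, a3],
        [a4, -a0, a6, a2, -a2, -a6, a0, -a4],
        [a5, -a1, a1, -a5, -a5, a1, -a1, a5],
        [a6, -a4, a2, -a0, a0, -a2, a4, -a6],
    ]

def row_normalize(t):
    """Diagonal orthogonalization, valid when t * t^T is diagonal."""
    return [[v / math.sqrt(sum(u * u for u in row)) for v in row] for row in t]

def quad(u, v):
    """u * R_x * v^T for row vectors u and v."""
    return sum(u[i] * R_X[i][j] * v[j] for i in range(N) for j in range(N))

def total_error_energy(c, c_hat):
    return math.pi * sum((p - q) ** 2 for r, s in zip(c, c_hat) for p, q in zip(r, s))

def mse(c, c_hat):
    diff = [[p - q for p, q in zip(r, s)] for r, s in zip(c, c_hat)]
    return sum(quad(d, d) for d in diff) / N

def coding_gain(c_hat):
    inverse = [list(col) for col in zip(*c_hat)]  # transpose, assuming orthogonality
    total = sum(math.log10(quad(h, h) * sum(v * v for v in g))
                for h, g in zip(c_hat, inverse))
    return -10.0 * total / N

def transform_efficiency(c_hat):
    cov = [[quad(r, s) for s in c_hat] for r in c_hat]
    return 100.0 * sum(abs(cov[i][i]) for i in range(N)) / sum(abs(v) for row in cov for v in row)

C8 = dct_matrix(N)
rounded = row_normalize(fw_matrix([1, 1, 1, 1, 1, 0, 0]))
dct_gain = coding_gain(C8)
rounded_energy = total_error_energy(C8, rounded)
```

Evaluating `coding_gain(C8)` recovers the 8.8259 dB figure quoted above for the exact DCT, which gives some confidence that the covariance model and the gain formula are read correctly.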

4 Optimization over the Feig-Winograd Space and New Transformations


4.1 Multicriteria Optimization 

In this section, we propose and solve an optimization problem considering the criteria discussed in the previous section.
More formally, we aim at solving the following multicriteria optimization problem [77]:

$$
\arg\min_{\boldsymbol{\alpha}} \left( \epsilon(\boldsymbol{\alpha}), \mathrm{MSE}(\boldsymbol{\alpha}), -C_g(\boldsymbol{\alpha}), -\eta(\boldsymbol{\alpha}), \mathcal{A}(\boldsymbol{\alpha}), \mathcal{S}(\boldsymbol{\alpha}) \right).
\tag{16}
$$

In (16), the dependence of the proximity measures on the parameter vector $\boldsymbol{\alpha}$ is emphasized. Since
$C_g(\boldsymbol{\alpha})$ and $\eta(\boldsymbol{\alpha})$ are to be maximized, we considered them in negative form. We
emphasize that all the objective functions are equally relevant and therefore no ranking among them is considered. This
procedure is widely used in different optimization problems that consider several objective functions [78].


To address the above optimization problem, we must identify the search space, the set of constraints, and the solving
method. As we require the candidate solutions $\boldsymbol{\alpha}$ to generate low-complexity matrices
$\operatorname{FW}(\boldsymbol{\alpha})$, we have that $\alpha_k \in \mathcal{P}$, $k = 0,1,\ldots,6$. Thus the search
space for (16) is the set $\mathcal{P}^7$. Additionally, we notice that (16) is a constrained problem.
In fact, we only consider candidate solutions whose inverse transform possesses low arithmetic complexity. In other
words, both direct and inverse transformations require only numerical values defined on the set
$\mathcal{P} = \{0, \pm 1/2, \pm 1, \pm 2\}$.

In view of the above restriction, we recognize that (16) is not an analytically tractable problem. However, because
the search space contains only $7^7 = 823543$ elements, exhaustive computational search is practical. Exhaustive
search is guaranteed to find solutions and could indeed identify the efficient solutions of (16). Let
$\mathcal{F} = \{\epsilon(\cdot), \mathrm{MSE}(\cdot), -C_g(\cdot), -\eta(\cdot), \mathcal{A}(\cdot), \mathcal{S}(\cdot)\}$
be the set of objective functions considered in (16). Each of the six values given by these objective functions provides
a vector in $\mathbb{R}^6$. Since there is no canonical order in $\mathbb{R}^6$, the Pareto ordering is considered to find
the values $\boldsymbol{\alpha}^*$ that are solutions of (16), which define the set of efficient solutions [76]:

$$
\left\{ \boldsymbol{\alpha}^* \in \mathcal{P}^7 : \text{there is no } \boldsymbol{\alpha} \in \mathcal{P}^7 \text{ such that }
f(\boldsymbol{\alpha}) \leq f(\boldsymbol{\alpha}^*) \text{ for all } f \in \mathcal{F} \text{ and }
f_0(\boldsymbol{\alpha}) < f_0(\boldsymbol{\alpha}^*) \text{ for some } f_0 \in \mathcal{F} \right\}.
$$
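The Pareto ordering used to define the efficient solutions can be sketched as a small filter over objective vectors. This is a generic illustration with hypothetical names (`dominates`, `pareto_front`) and made-up two-objective data; it is not the search code used in the paper, and all objectives are assumed to be minimized.

```python
from itertools import product

PARAMETER_SET = [0, 0.5, -0.5, 1, -1, 2, -2]

# Size of the exhaustive search space: seven parameters, seven values each.
search_space_size = len(PARAMETER_SET) ** 7

def dominates(u, v):
    """True if objective vector u is no worse than v everywhere and better somewhere."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Toy example with two objectives per candidate.
toy_objectives = [(1, 2), (2, 1), (2, 2), (3, 3)]
toy_front = pareto_front(toy_objectives)

# Candidate vectors can be enumerated lazily, for example the first three:
first_candidates = [c for _, c in zip(range(3), product(PARAMETER_SET, repeat=7))]
```

In an actual search, each candidate would be mapped to its six objective values (after discarding candidates that violate the inverse-complexity constraint) before the filter is applied.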
Table 2: Efficient solutions in the Feig-Winograd space

i     Efficient solution (α*_i)
1     [1 1 1 1 1 1/2 0]^T
2     [1 1 1 1 1 0 0]^T
3     [1 1 0 1 0 0 0]^T
4     [1 2 0 1 0 1 0]^T
5     [0 1 1 1 1 0 0]^T
6     [0 2 1 1 1 1 0]^T
7     [0 2 2 1 1 1 0]^T
8     [2 2 0 1 0 1 1/2]^T
9     [1 2 1 1 1 1 0]^T
10    [1 1 0 1 0 1/2 0]^T
11    [0 1 1 1 1 1/2 0]^T
12    [0 1 2 1 1 1/2 0]^T
13    [0 2 1 1 1/2 1 0]^T
14    [0 1 1 1 1/2 1/2 0]^T
15    [2 1 0 1 0 1/2 1/2]^T
16    [1 1 1 1 0 0 0]^T


4.2 Efficient Solutions and New Transformations 


As a result of the computational search, we obtained 16 distinct efficient solutions referred to as
$\boldsymbol{\alpha}_i^*$, $i = 1,2,\ldots,16$, as shown in Table 2. Each efficient solution implies an approximate DCT
matrix given by $\mathbf{T}_8^{(i)} = \operatorname{FW}(\boldsymbol{\alpha}_i^*)$. The explicit form of
$\mathbf{T}_8^{(i)}$, $i = 1,2,\ldots,16$, is directly obtained from (1).

We notice that among the obtained efficient solutions, some of them can be linked to orthogonal approximations
already archived in literature. In particular, we have that (i) $\mathbf{T}_8^{(1)}$ is the level 1 approximation
suggested by Lengwehasatit-Ortega [43], (ii) $\mathbf{T}_8^{(2)}$ is the rounded DCT introduced in [48], and
(iii) $\mathbf{T}_8^{(3)}$ is the low-complexity DCT approximation proposed in [40].

On the other hand, matrices $\mathbf{T}_8^{(4)}, \mathbf{T}_8^{(5)}, \ldots, \mathbf{T}_8^{(16)}$ are new
transformations. Except for $\mathbf{T}_8^{(16)}$, all of them satisfy (11) and consequently lead to orthogonal
approximations.


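4.2.1 Orthogonal Approximations

A careful examination reveals the following relationships among the orthogonalizable approximations:

$$
\begin{aligned}
\mathbf{T}_8^{(9)} &= \mathbf{D}_8^{(1)} \cdot \mathbf{T}_8^{(1)}, &
\mathbf{T}_8^{(10)} &= \mathbf{D}_8^{(2)} \cdot \mathbf{T}_8^{(4)}, &
\mathbf{T}_8^{(11)} &= \mathbf{D}_8^{(2)} \cdot \mathbf{T}_8^{(6)}, \\
\mathbf{T}_8^{(12)} &= \mathbf{D}_8^{(2)} \cdot \mathbf{T}_8^{(7)}, &
\mathbf{T}_8^{(13)} &= \mathbf{D}_8^{(3)} \cdot \mathbf{T}_8^{(7)}, &
\mathbf{T}_8^{(14)} &= \mathbf{D}_8^{(4)} \cdot \mathbf{T}_8^{(7)}, \\
\mathbf{T}_8^{(15)} &= \mathbf{D}_8^{(2)} \cdot \mathbf{T}_8^{(8)},
\end{aligned}
$$

where $\mathbf{D}_8^{(1)} = \operatorname{diag}(1, 1, 2, 1, 1, 1, 2, 1)$,
$\mathbf{D}_8^{(2)} = \operatorname{diag}\left(1, 1, \tfrac{1}{2}, 1, 1, 1, \tfrac{1}{2}, 1\right)$,
$\mathbf{D}_8^{(3)} = \operatorname{diag}\left(1, \tfrac{1}{2}, 1, \tfrac{1}{2}, 1, \tfrac{1}{2}, 1, \tfrac{1}{2}\right)$, and
$\mathbf{D}_8^{(4)} = \operatorname{diag}\left(1, \tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2}, 1, \tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2}\right)$.

As a consequence, although these transformations are pair-wise different, the resulting orthogonal approximations
(cf. (10)) derived from them may not be. This is because the orthogonalization operation described in (10) is
not affected by the left multiplication of a diagonal matrix of positive elements.

Therefore, in this sense, each of the following sets contains equivalent solutions [79, p. 332] with respect to the DCT
approximation procedure based on the polar decomposition:

$$
\left\{ \mathbf{T}_8^{(1)}, \mathbf{T}_8^{(9)} \right\}, \quad
\left\{ \mathbf{T}_8^{(4)}, \mathbf{T}_8^{(10)} \right\}, \quad
\left\{ \mathbf{T}_8^{(6)}, \mathbf{T}_8^{(11)} \right\}, \quad
\left\{ \mathbf{T}_8^{(8)}, \mathbf{T}_8^{(15)} \right\}, \quad
\left\{ \mathbf{T}_8^{(7)}, \mathbf{T}_8^{(12)}, \mathbf{T}_8^{(13)}, \mathbf{T}_8^{(14)} \right\}.
\tag{17}
$$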


Thus, for instance, although the new matrix $\mathbf{T}_8^{(9)}$ is not explicitly listed in literature, it is equivalent
to the Lengwehasatit-Ortega approximation $\mathbf{T}_8^{(1)}$ [43]. The remaining matrices found no equivalence to any
approximation listed in literature, being structurally new. Therefore, we separate $\mathbf{T}_8^{(4)}$,
$\mathbf{T}_8^{(5)}$, $\mathbf{T}_8^{(6)}$, $\mathbf{T}_8^{(7)}$, and $\mathbf{T}_8^{(8)}$ as genuinely new transformations.
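The equivalence under positive diagonal scaling can be illustrated with the first and ninth efficient solutions. The sketch assumes that, for matrices with a diagonal Gram matrix, the orthogonalization reduces to normalizing each row; `fw_matrix` and `orthogonalize_rows` are illustrative helper names.

```python
import math

def fw_matrix(a):
    """Explicit 8 x 8 matrix of the family for parameters a[0..6]."""
    a0, a1, a2, a3, a4, a5, a6 = a
    return [
        [a3] * 8,
        [a0, a2, a4, a6, -a6, -a4, -a2, -a0],
        [a1, a5, -a5, -a1, -a1, -a5, a5, a1],
        [a2, -a6, -a0, -a4, a4, a0, a6, -a2],
        [a3, -a3, -a3, a3, a3, -a3, -a3, a3],
        [a4, -a0, a6, a2, -a2, -a6, a0, -a4],
        [a5, -a1, a1, -a5, -a5, a1, -a1, a5],
        [a6, -a4, a2, -a0, a0, -a2, a4, -a6],
    ]

def orthogonalize_rows(t):
    """Row normalization; matches the polar-decomposition step when t * t^T is diagonal."""
    return [[v / math.sqrt(sum(u * u for u in row)) for v in row] for row in t]

t1 = fw_matrix([1, 1, 1, 1, 1, 0.5, 0])   # first efficient solution
t9 = fw_matrix([1, 2, 1, 1, 1, 1, 0])     # ninth efficient solution

# t9 differs from t1 only by doubling rows 2 and 6 (0-based indices).
scaling = [1, 1, 2, 1, 1, 1, 2, 1]
is_scaled_copy = all(t9[i][j] == scaling[i] * t1[i][j] for i in range(8) for j in range(8))

c1, c9 = orthogonalize_rows(t1), orthogonalize_rows(t9)
same_orthogonal_matrix = all(abs(p - q) < 1e-12
                             for r, s in zip(c1, c9) for p, q in zip(r, s))
```

Both low-complexity matrices therefore lead to the same orthogonal approximation, which is why only one representative per set needs to be assessed.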


4.2.2 Nonorthogonal approximation 

Among the efficient solutions, we obtained only one transformation that leads to a nonorthogonal DCT approximation. 
This particular solution, referred to as tI^^^ is a new transformation and is given by 


t(16) _ 
■ 0 - 


1 1 
1 1 


111111 
0 0 0 0 -1 -1 


1 0 0 -1 -1 0 0 1 
10 - 10010-1 
1 1 - 1-1 1 

1-1010 
00 1-1 0 


1 -1 -1 
0-1 0 
0 -1 1 
LO 0 1 


-1 1-1 0 OJ 


Notice that $\mathbf{T}_8^{(16)}$ does not satisfy conditions (11)-(12). Considering the deviation from diagonality
measure discussed in [58] (cf. (15)), the matrix $\mathbf{T}_8^{(16)} \cdot (\mathbf{T}_8^{(16)})^\top$ is nearly
diagonal. In comparison, the well-known 8-point SDCT also furnishes a near-diagonal matrix, whose deviation from
diagonality is 0.20. In a sense, matrix $\mathbf{T}_8^{(16)}$ is "more orthogonal" than the SDCT low-complexity
matrix. Thus, we accept $\mathbf{T}_8^{(16)}$ as a near-orthogonal matrix adequate for DCT approximation.

Consequently, the associated nonorthogonal approximation is derived from (14) and is given by
$\hat{\mathbf{C}}_8^{(16)} = \mathbf{S}_8^{(16)} \cdot \mathbf{T}_8^{(16)}$, where the diagonal matrix
$\mathbf{S}_8^{(16)}$ is obtained from (13).


Table 3 summarizes the above discussion on the new approximations and their relationships. Notice that all proposed
approximations allow perfect reconstruction. Most of them allow orthogonal approximation; thus they are lossless
transforms [80, p. xi]. Although $\mathbf{T}_8^{(16)}$ is a nonorthogonal transform, it permits perfect
reconstruction, because it possesses a well-defined inverse matrix.
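The two properties claimed for the nonorthogonal solution (rows that are not mutually orthogonal, yet an invertible matrix) can be verified directly. The sketch below transcribes the entries of the 8x8 matrix given above. The deviation measure is computed as one minus the ratio of squared Frobenius norms of the diagonal part and of the full matrix; this particular form is an assumption and may differ from the exact definition used in [58]. Function names are hypothetical.

```python
from fractions import Fraction

# Entries transcribed from the nonorthogonal 8x8 matrix given above.
T16 = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0, -1, -1],
    [1, 0, 0, -1, -1, 0, 0, 1],
    [1, 0, -1, 0, 0, 1, 0, -1],
    [1, -1, -1, 1, 1, -1, -1, 1],
    [0, -1, 0, 1, -1, 0, 1, 0],
    [0, -1, 1, 0, 0, 1, -1, 0],
    [0, 0, 1, -1, 1, -1, 0, 0],
]

def gram(T):
    """Return T * T^T (inner products between all pairs of rows)."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in T] for r1 in T]

def deviation_from_diagonality(A):
    """Assumed form: 1 - ||diag(A)||_F^2 / ||A||_F^2 (zero for a diagonal A)."""
    total = sum(x * x for row in A for x in row)
    diagonal = sum(A[i][i] ** 2 for i in range(len(A)))
    return 1 - diagonal / total

def determinant(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    det = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            det = -det
        det *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= factor * A[c][k]
    return det

G = gram(T16)
rows_are_orthogonal = all(
    G[i][j] == 0 for i in range(8) for j in range(8) if i != j
)
delta = deviation_from_diagonality(G)
det = determinant(T16)

print(rows_are_orthogonal)  # False: some odd-indexed rows are not orthogonal
print(delta)                # 0.125 under the assumed definition
print(det != 0)             # True: the matrix is invertible
```

Under this assumed definition the value stays below the 0.20 quoted for the SDCT, and the nonzero determinant is what guarantees perfect reconstruction despite the lack of orthogonality.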


4.3 Assessment of the New Approximations 

We submitted all obtained efficient solutions to the approximation procedure described in (10) and (14), depending on
whether a given solution is orthogonalizable or not. The derived approximations were assessed by means of (i) arithmetic
complexity evaluation and (ii) proximity measures with respect to the exact DCT.

Only non-equivalent transformations were considered, as discussed in (17). We also notice that equivalent
transformations possess exactly the same computational complexity. Thus, we only considered the following
transformations: $\mathbf{T}_8^{(1)}$, $\mathbf{T}_8^{(2)}$, $\mathbf{T}_8^{(3)}$, $\mathbf{T}_8^{(4)}$,
$\mathbf{T}_8^{(5)}$, $\mathbf{T}_8^{(6)}$, $\mathbf{T}_8^{(7)}$, $\mathbf{T}_8^{(8)}$, and $\mathbf{T}_8^{(16)}$.
Table 4 displays the obtained measures. For comparison, result values corresponding to the following transformations
were also included: the 8-point exact DCT, the 8-point transforms employed in AVC/H.264 [56] and HEVC/H.265 [14, 20],
the 8-point SDCT, and the approximation described in [49]. Although these additional transforms are in the
Feig-Winograd subspace, they are not a result of the proposed optimization problem.

Table 3: Efficient Feig-Winograd DCT approximations

Transform                Orthogonalizable?   Description
$\mathbf{T}_8^{(1)}$     Yes                 Proposed in [43]
$\mathbf{T}_8^{(2)}$     Yes                 Proposed in [48]
$\mathbf{T}_8^{(3)}$     Yes                 Proposed in [40]
$\mathbf{T}_8^{(4)}$     Yes                 New transformation
$\mathbf{T}_8^{(5)}$     Yes                 New transformation
$\mathbf{T}_8^{(6)}$     Yes                 New transformation
$\mathbf{T}_8^{(7)}$     Yes                 New transformation
$\mathbf{T}_8^{(8)}$     Yes                 New transformation
$\mathbf{T}_8^{(9)}$     Yes                 New, but equivalent to $\mathbf{T}_8^{(1)}$
$\mathbf{T}_8^{(10)}$    Yes                 New, but equivalent to $\mathbf{T}_8^{(4)}$
$\mathbf{T}_8^{(11)}$    Yes                 New, but equivalent to a previously listed matrix (cf. (17))
$\mathbf{T}_8^{(12)}$    Yes                 New, but equivalent to a previously listed matrix (cf. (17))
$\mathbf{T}_8^{(13)}$    Yes                 New, but equivalent to a previously listed matrix (cf. (17))
$\mathbf{T}_8^{(14)}$    Yes                 New, but equivalent to a previously listed matrix (cf. (17))
$\mathbf{T}_8^{(15)}$    Yes                 New, but equivalent to a previously listed matrix (cf. (17))
$\mathbf{T}_8^{(16)}$    No                  New transformation


4.4 Discussion and Comparison 


Measurements shown in Table 4 revealed that, among the optimal Feig-Winograd matrices, the transform
$\mathbf{T}_8^{(1)}$ presents superior performance according to all proximity measures. We recall that
$\mathbf{T}_8^{(1)}$ coincides with the DCT approximation proposed by Lengwehasatit-Ortega, which is well-known for
being a very good approximation [43]. Nevertheless, it is also recognized for its comparatively high computational
complexity, which makes it a less attractive tool when other approximations are considered. Approximation
$\mathbf{T}_8^{(2)}$, identified as the rounded DCT [48], also displays close mathematical proximity to the DCT, while
demanding lower computational effort in comparison to $\mathbf{T}_8^{(1)}$. Previously described in [40], matrix
$\mathbf{T}_8^{(3)}$ has the distinction of requiring only 14 additions and no bit-shifting operation, achieving
the minimal possible arithmetic complexity among all considered Feig-Winograd matrices.

Regarding the new proposed approximations, matrix $\mathbf{T}_8^{(4)}$ possesses good performance in terms of
proximity measures, while requiring only 16 additions. New matrices $\mathbf{T}_8^{(5)}$ and $\mathbf{T}_8^{(16)}$
require only 18 additions and can be more closely compared. While $\mathbf{T}_8^{(5)}$ leads to an orthogonal
approximation, $\mathbf{T}_8^{(16)}$ furnishes a nonorthogonal one. Matrix $\mathbf{T}_8^{(16)}$ has the smallest
total error energy and mean square error among all new transformations. In terms of unified coding gain, the
orthogonal transform $\mathbf{T}_8^{(5)}$ outperforms $\mathbf{T}_8^{(16)}$; on the other hand, when transform
efficiency is considered, the nonorthogonal approximation excels.

New efficient transforms $\mathbf{T}_8^{(6)}$, $\mathbf{T}_8^{(7)}$, and $\mathbf{T}_8^{(8)}$ exhibit good
coding-related figures, whereas they impose higher computational complexity requirements. At the same time, their
additive complexity is not as large as the one required by the Lengwehasatit-Ortega ($\mathbf{T}_8^{(1)}$)
approximation.

Additionally, Table 4 shows that a trade-off between computational cost and performance is observed in some
rough sense. To further our analysis, we also examined the following low-complexity transformations which are not
encompassed in the Feig-Winograd matrix space: the Walsh-Hadamard transform (WHT) [82] and the series of DCT
approximations introduced by Bouguezel-Ahmad-Swamy, labeled as BAS$^{(1)}$ [44], BAS$^{(2)}$ [41], BAS$^{(3)}$,
BAS$^{(4)}$ [46], BAS$^{(5)}$ (for $a = 1$), BAS$^{(6)}$ (for $a = 0$), BAS$^{(7)}$ (for $a = 0.5$), and
BAS$^{(8)}$ [47]. Table 5 lists the assessment measurements for the above-mentioned approximations. The new proposed
matrix $\mathbf{T}_8^{(16)}$ outperforms the BAS series approximations in terms of total error energy and MSE.

Table 4: Assessment of Feig-Winograd matrices

Transform                    Total error energy   MSE     Coding gain   Transform efficiency   Additions   Bit-shifts   Mult.

Optimal transforms
$\mathbf{T}_8^{(1)}$ [43]    0.870                0.006   8.39          88.70                  24          2            0
$\mathbf{T}_8^{(2)}$ [48]    1.794                0.010   8.18          87.43                  22          0            0
$\mathbf{T}_8^{(3)}$ [40]    8.659                0.059   7.33          80.90                  14          0            0
$\mathbf{T}_8^{(4)}$         7.734                0.056   7.54          81.99                  16          2            0
$\mathbf{T}_8^{(5)}$         8.659                0.059   7.37          81.18                  18          0            0
$\mathbf{T}_8^{(6)}$         7.734                0.055   7.58          82.27                  20          2            0
$\mathbf{T}_8^{(7)}$         7.532                0.054   7.56          82.70                  20          6            0
$\mathbf{T}_8^{(8)}$         7.414                0.053   7.58          83.08                  20          10           0
$\mathbf{T}_8^{(16)}$        3.316                0.021   6.05          83.08                  18          0            0

Non-optimal transforms
SDCT                         3.316                0.021   6.03          82.62                  24          0            0
Approximation in [49]        0.870                0.006   8.34          88.06                  24          6            0
AVC                          0.072                0.000   8.78          92.46                  32          10           0
HEVC                         0.002                0.000   8.82          93.82                  28          0            22
DCT                          0                    0       8.83          93.99                  28          0            22

To better visualize such balance, we devised scatter plots relating computational cost to the discussed performance
measures. Fig. 2 displays the resulting scatter plots, where each transform corresponds to a labeled point.
Orthogonal transforms are denoted by circle marks and nonorthogonal transforms are indicated by cross marks.
Approximations BAS$^{(5)}$, BAS$^{(6)}$, BAS$^{(7)}$, and BAS$^{(8)}$ were not included in Fig. 2(a) because their
measurements were exceedingly large. Similarly, one transform was also excluded from Fig. 2(b).

Table 5: Assessment of selected non-Feig-Winograd transforms

Transform      Orthogonalizable?   Total error energy   MSE     Coding gain   Transform efficiency   Additions   Bit-shifts
BAS$^{(1)}$    No                  4.188                0.019   6.27          83.17                  21          0
BAS$^{(2)}$    Yes                 5.929                0.024   8.12          86.86                  18          2
BAS$^{(3)}$    Yes                 6.854                0.028   7.91          85.38                  18          0
BAS$^{(4)}$    Yes                 4.093                0.021   8.33          88.22                  24          4
BAS$^{(5)}$    Yes                 26.864               0.071   7.91          85.38                  18          0
BAS$^{(6)}$    Yes                 26.864               0.071   7.91          85.64                  16          0
BAS$^{(7)}$    Yes                 26.402               0.068   8.12          86.86                  18          2
BAS$^{(8)}$    Yes                 35.064               0.102   7.95          85.31                  24          0
WHT            Yes                 5.049                0.025   7.95          85.31                  24          0

Figure 2: Assessment plots for the proposed efficient approximations; competing methods were included for comparison.

Fig. 2(a)-(b) indicates that the Feig-Winograd approximations excel in terms of proximity to the original DCT,
as measured by the total error energy and MSE. Additionally, the Feig-Winograd transforms presented better
coding-related figures, except for transformations that require 16 and 18 additions.

In particular, considering transformations that required only 16 additions, we notice that the proposed approximation
$\mathbf{T}_8^{(4)}$ outperformed BAS$^{(6)}$ in terms of total error energy and MSE. However, BAS$^{(6)}$ shows better
coding gain performance than $\mathbf{T}_8^{(4)}$. Now considering transformations demanding 18 additions, we have that
approximations BAS$^{(2)}$, BAS$^{(3)}$, BAS$^{(5)}$, and BAS$^{(7)}$ showed better coding gain performance than the
proposed transformations $\mathbf{T}_8^{(5)}$ and $\mathbf{T}_8^{(16)}$. As expected, the approximation
$\mathbf{T}_8^{(1)}$ has the best coding performance among all considered transforms in Fig. 2. Suggested
approximation $\mathbf{T}_8^{(2)}$ could outperform non-Feig-Winograd transform in all metrics.

In terms of the nonorthogonal transforms $\mathbf{T}_8^{(16)}$ and BAS$^{(1)}$, we report that $\mathbf{T}_8^{(16)}$
possesses the smaller total error energy when compared to BAS$^{(1)}$ while presenting a similar performance. However,
$\mathbf{T}_8^{(16)}$ is 14.3% less computationally complex (18 additions against 21). Also, as discussed in [71], it
is generally expected that nonorthogonal transforms present lower values of coding gain compared with orthogonal
transforms (cf. Fig. 2(c)). This is expected since coding gain measures are optimized for the KLT [72], which is
orthogonal.


We also notice that Figs. 2(a)-(d) can be interpreted as results of four different bi-objective optimization
problems, where the trade-off between computational cost and performance measures is emphasized. The optimal
solutions of the bi-objective optimization problems are situated on the boundary of the set of feasible solutions
[76, p. 28]. Such boundaries are shown in Fig. 2 and roughly represent the Pareto frontiers for each problem
[77, p. 11]. Thus, in this sense, we separate the transforms situated on each Pareto frontier as optimal
transformations. Such transforms were: $\mathbf{T}_8^{(1)}$, $\mathbf{T}_8^{(2)}$, $\mathbf{T}_8^{(3)}$,
$\mathbf{T}_8^{(4)}$, $\mathbf{T}_8^{(16)}$, BAS$^{(1)}$, BAS$^{(2)}$, BAS$^{(4)}$, and BAS$^{(6)}$. This reduced set
of transforms was submitted to subsequent image compression analysis.
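As an illustration of one of the four bi-objective problems, the sketch below extracts the non-dominated points when additive complexity and total error energy are both minimized. The numbers are transcribed from Tables 4 and 5; the short labels (`T1`, `BAS1`, ...) and the choice of counting additions only as the cost are assumptions made for this example. It is not the authors' selection procedure, which relies on the boundaries drawn in Fig. 2 for all four measures.

```python
# (label, additions, total error energy), transcribed from Tables 4 and 5.
DATA = [
    ("T1", 24, 0.870), ("T2", 22, 1.794), ("T3", 14, 8.659),
    ("T4", 16, 7.734), ("T5", 18, 8.659), ("T6", 20, 7.734),
    ("T7", 20, 7.532), ("T8", 20, 7.414), ("T16", 18, 3.316),
    ("BAS1", 21, 4.188), ("BAS2", 18, 5.929), ("BAS3", 18, 6.854),
    ("BAS4", 24, 4.093), ("BAS5", 18, 26.864), ("BAS6", 16, 26.864),
    ("BAS7", 18, 26.402), ("BAS8", 24, 35.064), ("WHT", 24, 5.049),
]

def dominates(a, b):
    """a dominates b if it is no worse in both objectives and better in one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better

def pareto_frontier(points):
    """Return the non-dominated points, sorted by additive complexity."""
    frontier = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(frontier, key=lambda p: p[1])

frontier_labels = [p[0] for p in pareto_frontier(DATA)]
print(frontier_labels)  # ['T3', 'T4', 'T16', 'T2', 'T1']
```

For this particular pair of objectives only Feig-Winograd matrices are non-dominated; repeating the computation with MSE, coding gain, or transform efficiency (the latter two maximized) yields different frontiers.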

However, notice that although the remaining transforms were outperformed, the performance of a DCT approximation is
highly dependent on the envisioned task. Thus, these transforms may be adequate in other contexts, as suggested
in [47].

In terms of comparisons with other methods, we notice that current video standards AVC/H.264 [56] and
HEVC/H.265 [57, 81] employ integer DCT approximations, which are indeed encompassed in the Feig-Winograd
matrix subspace. However, these integer matrices do not satisfy the low-complexity restriction on the matrix
elements, namely having the parameter vector defined over the set {0, ±1/2, ±1, ±2}, as shown in Section 2.5.
Possessing large elements, the implementation of such matrices requires more sophisticated operation schemes, which
make them more computationally expensive [56, 57] when compared to the extremely low-complexity approximations
discussed here. For instance, in terms of additive complexity, the AVC requires from 50.00% to 128.57% more operations
in comparison with the optimal transformations, as shown in Table 4. In turn, HEVC requires multiplication operations,
which are much more expensive than simple additions [83]. For a fair comparison, we restricted our subsequent analyses
to the set of very low-complexity matrices whose elements are in the above set.


5 Image Compression 

In order to evaluate the performance of the selected approximations in image compression, we adopted a JPEG-like
computational simulation [24, 40-42, 44-49, 84] based on a set of 45 512×512 8-bit greyscale standard images
obtained from a public image bank [85]. Images were split into 8×8 sub-blocks, which were submitted to a given
2-D transformation [86] depending on the considered DCT approximation. The 2-D transform operation depends on
whether the considered transform is orthogonal or not. Let $\mathbf{A}$ be an 8×8 sub-block of the considered image.
In general, the 2-D approximate DCT of $\mathbf{A}$ can be written as [86]:

$$
\mathbf{B} =
\begin{cases}
\hat{\mathbf{C}}_8 \cdot \mathbf{A} \cdot \hat{\mathbf{C}}_8^\top, & \text{if } \mathbf{T}_8 \text{ satisfies (11)}, \\
\hat{\mathbf{C}}_8 \cdot \mathbf{A} \cdot \hat{\mathbf{C}}_8^{-1}, & \text{otherwise.}
\end{cases}
$$


This computation furnished 64 coefficients in the approximate transform domain for each sub-block. According to the
standard zigzag sequence [8], only the $r$ initial coefficients in each block were retained and employed to
reconstruct the image. All the remaining coefficients were set to zero. We adopted $1 \le r \le 45$. Based on 8-bit
images, this approach implies a fixed bitrate equal to $r/8$ bpp. After compression, the inverse 2-D transform was
applied to reconstruct the processed data. Subsequently, image quality was evaluated.
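The coefficient-retention step can be sketched as follows. The code builds the usual JPEG zigzag scan for an 8×8 block, keeps the first r coefficients, zeroes the rest, and evaluates the r/8 bpp figure quoted above. Function names are hypothetical, and the block of ones stands in for real transform coefficients; this is a minimal sketch, not the simulation code used in the experiments.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        diagonal = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Even anti-diagonals run from bottom-left to top-right,
        # odd ones from top-right to bottom-left.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order

def retain_first(block, r):
    """Keep the first r coefficients in zigzag order; set the others to zero."""
    n = len(block)
    kept = set(zigzag_order(n)[:r])
    return [[block[i][j] if (i, j) in kept else 0 for j in range(n)]
            for i in range(n)]

def bitrate_bpp(r, bits_per_pixel=8, block_size=8):
    """Bitrate when r of block_size**2 coefficients are kept (r/8 for 8x8, 8-bit)."""
    return r * bits_per_pixel / (block_size * block_size)

# Placeholder block: every coefficient equal to one.
block = [[1] * 8 for _ in range(8)]
truncated = retain_first(block, 25)
kept_count = sum(sum(row) for row in truncated)

print(zigzag_order()[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
print(kept_count)           # 25
print(bitrate_bpp(25))      # 3.125
```

With r = 25, 39 of the 64 coefficients of each block are discarded, which is roughly 60% of them.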

The methodology outlined above has also been described and supported in the literature. However, in contrast to the
JPEG-based image compression experiments described in [41, 42, 44-46], we adopted average image quality measures
from the entire image set [40, 48] instead of the results obtained from particular selected images. Thus, our approach
is less prone to variance effects and fortuitous input data, being a more robust methodology [87].

Image degradation was assessed by means of the peak signal-to-noise ratio (PSNR) and the structural similarity
index (SSIM) [89]. The PSNR is a quality measure widely used in image processing and the SSIM is regarded as a
complementary method for image quality assessment. In fact, the SSIM considers luminance, contrast, and image
structure to quantify image degradation, being consistent with subjective quality measurements [90].
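For reference, the PSNR of an 8-bit image can be computed as sketched below, using the standard definition 10·log10(255²/MSE). The function name and the tiny test images are hypothetical; the sketch is not claimed to be the exact routine used to produce the reported figures.

```python
import math

def psnr(original, degraded, peak=255):
    """PSNR in dB between two equally sized images given as lists of rows."""
    squared_error = 0.0
    count = 0
    for row_o, row_d in zip(original, degraded):
        for a, b in zip(row_o, row_d):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)

# Hypothetical 2x2 images: every pixel differs by exactly one grey level.
reference = [[10, 20], [30, 40]]
distorted = [[11, 21], [29, 41]]
value = psnr(reference, distorted)
print(round(value, 2))  # 48.13
```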

Figs. 3(a)-(b) show the obtained quality measures for the transforms identified as optimal according to the discussion
at the end of the preceding section. This type of performance curve is commonly employed as a comparison tool in
the approximate DCT literature [44, 45]. For an improved visualization of the performance curves, we considered
the absolute percentage error (APE) relative to the DCT of both the PSNR and SSIM measurements. The resulting values
are plotted in Figs. 3(c)-(d). To further validate our methodology, we also computed additional descriptive statistical
measures of the results. In particular, the coefficient of variation was found to be consistently less than 16% for
all data samples. This fact suggests that adopting average values for the metrics is indeed an adequate approach, as
supported in [93, p. 387].
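The two summary statistics mentioned above can be sketched as follows. The APE is taken as the absolute difference from the DCT value, expressed as a percentage of the DCT value, and the coefficient of variation as the sample standard deviation divided by the mean; the use of the sample (rather than population) standard deviation is an assumption, and the sample list is made-up data used only to exercise the functions.

```python
import statistics

def absolute_percentage_error(value, reference):
    """APE (in %) of a measurement relative to a reference measurement."""
    return 100 * abs(value - reference) / abs(reference)

def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean, in %."""
    return 100 * statistics.stdev(samples) / statistics.mean(samples)

# PSNR of the rounded DCT and of the exact DCT for the Lena image (Fig. 4).
ape_psnr = absolute_percentage_error(34.138, 37.886)

# Made-up per-image PSNR values, only to illustrate the computation.
hypothetical_psnr_samples = [30.0, 32.0, 34.0, 36.0]
cv = coefficient_of_variation(hypothetical_psnr_samples)

print(round(ape_psnr, 2))
print(round(cv, 2))
```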


As expected, among all considered methods, $\mathbf{T}_8^{(1)}$ possesses the best performance in terms of image
quality at the cost of the highest computational complexity [44-46]. Feig-Winograd approximation
$\mathbf{T}_8^{(2)}$ could outperform transformations $\mathbf{T}_8^{(3)}$, $\mathbf{T}_8^{(4)}$,
$\mathbf{T}_8^{(16)}$, and BAS$^{(1)}$, while exhibiting similar performance to BAS$^{(2)}$ and BAS$^{(6)}$.
Transformations $\mathbf{T}_8^{(3)}$, $\mathbf{T}_8^{(4)}$, and $\mathbf{T}_8^{(16)}$ also performed better than
BAS$^{(1)}$ in terms of PSNR, for $r > 25$ (bitrate larger than 3.125 bpp), and of SSIM, for all values of $r$. The
new nonorthogonal $\mathbf{T}_8^{(16)}$ outperformed $\mathbf{T}_8^{(3)}$ and $\mathbf{T}_8^{(4)}$. However,
transformation $\mathbf{T}_8^{(16)}$ showed a lower arithmetic complexity when compared to the also nonorthogonal
BAS$^{(1)}$. Proposed approximation $\mathbf{T}_8^{(4)}$ surpassed $\mathbf{T}_8^{(3)}$ in both PSNR and SSIM
measurements for all compression ratios, requiring only two extra additions for its computation.

Fig. 4 shows the 512×512 Lena image after being submitted to the described JPEG-like compression, which provides a
qualitative comparison for $r = 25$ (3.125 bpp). This corresponds to discarding about 60% of the approximate DCT
coefficients. PSNR and SSIM measurements are included for comparison.

6 Conclusion and final remarks


We introduced a new class of matrices based on a parametrization of the Feig-Winograd DCT factorization. By solving
a constrained multicriteria optimization problem, several low-complexity DCT approximations were obtained. Among
the obtained solutions, we identified DCT approximations already reported in literature. Thus our procedure furnishes
a framework that mathematically unifies them. Moreover, we derived novel DCT approximations. The
new transformations were assessed in terms of computational complexity, DCT matrix proximity, coding gain, and
performance in JPEG-like compression. The DCT approximations in the Feig-Winograd matrix subspace exhibited
roughly a trade-off between cost and performance. The computational complexity of the proposed transforms ranged
from 14 to 24 additions and from 0 to 2 bit-shifting operations. It is worth noting that all transforms in the
Feig-Winograd class possess the same algorithm structure. Furthermore, the associated inverse transforms share similar
mathematical formalism and possess simple fast algorithms. Thus, in terms of circuitry design, one can interchange
transforms with minimal hardware modifications. In emerging reconfigurable systems, it may be possible to switch modus
operandi based on the demanded picture quality and required energy consumption. Thus, the proposed class of
approximations may be a candidate suite of fast algorithms in such context. Besides the image compression context, the
Feig-Winograd class of DCT approximations can be applied for data encryption following the scheme introduced in [95].
Finally, we remark that the proposed method is sufficiently flexible to be extended to large blocklengths that are
powers of two, according to the Feig-Winograd general theory for DCT factorization.

Figure 3: Quality measures of selected optimal approximations for several values of bpp according to the following
figures of merit: (a) average PSNR, (b) average SSIM, (c) average PSNR absolute percentage error relative to the DCT,
and (d) average SSIM absolute percentage error relative to the DCT.

Figure 4: Compressed Lena image using (a) $\mathbf{T}_8^{(1)}$ (PSNR=35.176, SSIM=0.995), (b) $\mathbf{T}_8^{(2)}$
(PSNR=34.138, SSIM=0.989), (c) $\mathbf{T}_8^{(3)}$ (PSNR=31.838, SSIM=0.970), (d) $\mathbf{T}_8^{(4)}$ (PSNR=32.299,
SSIM=0.977), (e) $\mathbf{T}_8^{(16)}$ (PSNR=31.602, SSIM=0.985), (f) BAS$^{(1)}$ (PSNR=31.452, SSIM=0.980),
(g) BAS$^{(2)}$ (PSNR=34.794, SSIM=0.991), (h) BAS$^{(6)}$ (PSNR=33.819, SSIM=0.986), (i) BAS$^{(4)}$ (PSNR=35.433,
SSIM=0.995), and (j) DCT (PSNR=37.886, SSIM=0.997), for $r = 25$ (3.125 bpp).